Univariate Plots Section
## [1] 1599 13
## 'data.frame': 1599 obs. of 13 variables:
## $ X : int 1 2 3 4 5 6 7 8 9 10 ...
## $ fixed.acidity : num 7.4 7.8 7.8 11.2 7.4 7.4 7.9 7.3 7.8 7.5 ...
## $ volatile.acidity : num 0.7 0.88 0.76 0.28 0.7 0.66 0.6 0.65 0.58 0.5 ...
## $ citric.acid : num 0 0 0.04 0.56 0 0 0.06 0 0.02 0.36 ...
## $ residual.sugar : num 1.9 2.6 2.3 1.9 1.9 1.8 1.6 1.2 2 6.1 ...
## $ chlorides : num 0.076 0.098 0.092 0.075 0.076 0.075 0.069 0.065 0.073 0.071 ...
## $ free.sulfur.dioxide : num 11 25 15 17 11 13 15 15 9 17 ...
## $ total.sulfur.dioxide: num 34 67 54 60 34 40 59 21 18 102 ...
## $ density : num 0.998 0.997 0.997 0.998 0.998 ...
## $ pH : num 3.51 3.2 3.26 3.16 3.51 3.51 3.3 3.39 3.36 3.35 ...
## $ sulphates : num 0.56 0.68 0.65 0.58 0.56 0.56 0.46 0.47 0.57 0.8 ...
## $ alcohol : num 9.4 9.8 9.8 9.8 9.4 9.4 9.4 10 9.5 10.5 ...
## $ quality : int 5 5 5 6 5 5 5 7 7 5 ...
## X fixed.acidity volatile.acidity citric.acid
## Min. : 1.0 Min. : 4.60 Min. :0.1200 Min. :0.000
## 1st Qu.: 400.5 1st Qu.: 7.10 1st Qu.:0.3900 1st Qu.:0.090
## Median : 800.0 Median : 7.90 Median :0.5200 Median :0.260
## Mean : 800.0 Mean : 8.32 Mean :0.5278 Mean :0.271
## 3rd Qu.:1199.5 3rd Qu.: 9.20 3rd Qu.:0.6400 3rd Qu.:0.420
## Max. :1599.0 Max. :15.90 Max. :1.5800 Max. :1.000
## residual.sugar chlorides free.sulfur.dioxide
## Min. : 0.900 Min. :0.01200 Min. : 1.00
## 1st Qu.: 1.900 1st Qu.:0.07000 1st Qu.: 7.00
## Median : 2.200 Median :0.07900 Median :14.00
## Mean : 2.539 Mean :0.08747 Mean :15.87
## 3rd Qu.: 2.600 3rd Qu.:0.09000 3rd Qu.:21.00
## Max. :15.500 Max. :0.61100 Max. :72.00
## total.sulfur.dioxide density pH sulphates
## Min. : 6.00 Min. :0.9901 Min. :2.740 Min. :0.3300
## 1st Qu.: 22.00 1st Qu.:0.9956 1st Qu.:3.210 1st Qu.:0.5500
## Median : 38.00 Median :0.9968 Median :3.310 Median :0.6200
## Mean : 46.47 Mean :0.9967 Mean :3.311 Mean :0.6581
## 3rd Qu.: 62.00 3rd Qu.:0.9978 3rd Qu.:3.400 3rd Qu.:0.7300
## Max. :289.00 Max. :1.0037 Max. :4.010 Max. :2.0000
## alcohol quality
## Min. : 8.40 Min. :3.000
## 1st Qu.: 9.50 1st Qu.:5.000
## Median :10.20 Median :6.000
## Mean :10.42 Mean :5.636
## 3rd Qu.:11.10 3rd Qu.:6.000
## Max. :14.90 Max. :8.000
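The dataset contains 1599 observations of 13 variables. The first variable, X, is only a row index (1 to 1599), so 12 variables describe the wines: 11 physicochemical measurements and a quality rating.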

## 75%
## 12.35
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 4.60 7.10 7.90 8.32 9.20 15.90
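Most wines have a fixed acidity between 6 and 11. The distribution is slightly right-skewed, with most wines having a fixed acidity of 11 or lower. There are multiple outliers with fixed acidity above 12.35. This cutoff is the upper outlier fence, Q3 + 1.5 × IQR = 9.20 + 1.5 × (9.20 − 7.10) = 12.35, even though it is printed under the label 75%. The cutoffs printed for the other variables below follow the same rule, with values labelled 25% giving the lower fence, Q1 − 1.5 × IQR.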
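The outlier cutoffs used throughout this section follow Tukey's rule: values above Q3 + 1.5 × IQR or below Q1 − 1.5 × IQR are flagged. The sketch below is a Python translation for illustration, not the author's R code. It assumes R's default (type 7) quantile definition, which Python's `statistics.quantiles` reproduces with `method="inclusive"`, and it rechecks the fixed acidity fence from the printed quartiles.

```python
import statistics


def tukey_fences(values, k=1.5):
    """Return (lower, upper) Tukey fences for a sequence of numbers.

    method="inclusive" matches R's default type 7 quantiles.
    """
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def fences_from_quartiles(q1, q3, k=1.5):
    """Same rule, applied to quartiles that were already computed."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def outliers(values, k=1.5):
    """Values lying strictly outside the Tukey fences."""
    lower, upper = tukey_fences(values, k)
    return [v for v in values if v < lower or v > upper]


# Toy data (illustrative only) with one obvious high outlier.
toy = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
print(tukey_fences(toy))  # (-3.5, 14.5)
print(outliers(toy))      # [100]

# Fixed acidity quartiles from the summary above: Q1 = 7.10, Q3 = 9.20.
lower, upper = fences_from_quartiles(7.10, 9.20)
print(round(upper, 2))    # 12.35
```

R's `boxplot()` whiskers use essentially the same 1.5 × IQR rule, which is why the outliers in the box plots line up with these cutoffs.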

## 75%
## 1.015
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.1200 0.3900 0.5200 0.5278 0.6400 1.5800
The wine with the lowest volatile acidity has a value of 0.12 and the highest has 1.58. Above, I plot the main body of the volatile acidity distribution, trimming the wines with the highest levels. The distribution appears bimodal. There are multiple outliers with volatile acidity above 1.015.



## 75%
## 0.915
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.000 0.090 0.260 0.271 0.420 1.000
##
## 0 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09 0.1 0.11 0.12 0.13 0.14
## 132 33 50 30 29 20 24 22 33 30 35 15 27 18 21
## 0.15 0.16 0.17 0.18 0.19 0.2 0.21 0.22 0.23 0.24 0.25 0.26 0.27 0.28 0.29
## 19 9 16 22 21 25 33 27 25 51 27 38 20 19 21
## 0.3 0.31 0.32 0.33 0.34 0.35 0.36 0.37 0.38 0.39 0.4 0.41 0.42 0.43 0.44
## 30 30 32 25 24 13 20 19 14 28 29 16 29 15 23
## 0.45 0.46 0.47 0.48 0.49 0.5 0.51 0.52 0.53 0.54 0.55 0.56 0.57 0.58 0.59
## 22 19 18 23 68 20 13 17 14 13 12 8 9 9 8
## 0.6 0.61 0.62 0.63 0.64 0.65 0.66 0.67 0.68 0.69 0.7 0.71 0.72 0.73 0.74
## 9 2 1 10 9 7 14 2 11 4 2 1 1 3 4
## 0.75 0.76 0.78 0.79 1
## 1 3 1 1 1
I transformed the long-tailed citric acid data to better understand its distribution. Citric acid values range from 0 to 1, and most lie between 0.09 and 0.42. The frequency table shows that 132 wines contain no citric acid at all, the most common single value. The wine with a citric acid level of 1 is the only outlier in this distribution, lying above the upper fence of 0.915.
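Because the output of R's `table()` is a set of value → count pairs, the "only outlier" claim can be checked against it directly. A minimal sketch: the dictionary is a hand-copied excerpt of the citric acid frequency table printed above (only the upper tail, 0.6 and above, which is all this check needs), and the fence is computed from the printed quartiles.

```python
# Upper tail of the citric acid frequency table above: value -> number of wines.
citric_tail = {
    0.60: 9, 0.61: 2, 0.62: 1, 0.63: 10, 0.64: 9, 0.65: 7, 0.66: 14,
    0.67: 2, 0.68: 11, 0.69: 4, 0.70: 2, 0.71: 1, 0.72: 1, 0.73: 3,
    0.74: 4, 0.75: 1, 0.76: 3, 0.78: 1, 0.79: 1, 1.00: 1,
}


def count_above(freq, threshold):
    """Number of observations whose value is strictly above threshold."""
    return sum(n for value, n in freq.items() if value > threshold)


q1, q3 = 0.09, 0.42  # quartiles from the summary printed above
upper_fence = q3 + 1.5 * (q3 - q1)

print(round(upper_fence, 3))                  # 0.915
print(count_above(citric_tail, upper_fence))  # 1
```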



## 75%
## 3.65
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.900 1.900 2.200 2.539 2.600 15.500
Most wines have a residual sugar level between 1.9 and 2.6. I plotted the values in this range, and they appear approximately normally distributed. Residual sugar ranges from 0.9 to 15.5. There are multiple outliers with residual sugar levels above 3.65.



## 75%
## 0.12
## 25%
## 0.04
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.01200 0.07000 0.07900 0.08747 0.09000 0.61100
Chloride levels range from 0.012 to 0.611, and most values lie between 0.07 and 0.09. I plotted the chloride levels of the wines within this range, and they appear approximately normally distributed. There are multiple outliers with chloride levels above 0.12, as well as a handful of low outliers with chloride levels below 0.04.



## 75%
## 42
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.00 7.00 14.00 15.87 21.00 72.00
## free.sulfur.dioxide
## 1 2 3 4 5 5.5 6 7 8 9 10 11 12 13 14
## 3 1 49 41 104 1 138 71 56 62 79 59 75 57 50
## 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29
## 78 61 60 46 39 30 41 22 32 34 24 32 29 23 23
## 30 31 32 33 34 35 36 37 37.5 38 39 40 40.5 41 42
## 16 20 22 11 18 15 11 3 2 9 5 6 1 7 3
## 43 45 46 47 48 50 51 52 53 54 55 57 66 68 72
## 3 3 1 1 4 2 4 3 1 1 2 1 1 2 1
Transforming the plot of free sulfur dioxide reveals a bimodal distribution. Most wines have values between 7 and 21; the lowest value is 1 and the highest is 72. There are multiple outliers with free sulfur dioxide levels above 42.
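The section does not say which transformation was applied. A log10 scale is a common choice for right-skewed, strictly positive measurements such as free sulfur dioxide, and it is assumed here. The sketch computes the moment-based sample skewness of a made-up right-skewed sample (not taken from the dataset) before and after a log10 transform, to show how the transform pulls in the long tail.

```python
import math


def skewness(values):
    """Moment-based sample skewness g1 = m3 / m2**1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Illustrative right-skewed sample (assumption: not rows from the dataset).
sample = [3, 5, 6, 6, 7, 10, 12, 14, 15, 17, 21, 25, 32, 42, 72]
logged = [math.log10(v) for v in sample]  # log10 needs strictly positive values

print(round(skewness(sample), 2))
print(round(skewness(logged), 2))
```

The log transform cannot be used as-is for citric acid, where 132 wines have a value of 0; a square-root scale or an offset would be needed there.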



## 75%
## 122
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 6.00 22.00 38.00 46.47 62.00 289.00
## total.sulfur.dioxide
## 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
## 3 4 14 14 27 26 29 28 33 35 26 27 35 29 33
## 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35
## 25 25 34 36 27 24 30 43 20 14 32 20 17 20 26
## 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50
## 12 26 31 16 17 14 26 18 23 20 17 24 21 21 11
## 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65
## 11 15 14 20 13 10 6 14 9 18 9 9 13 10 17
## 66 67 68 69 70 71 72 73 74 75 76 77 77.5 78 79
## 9 12 10 8 8 7 10 7 8 5 3 8 2 4 5
## 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94
## 4 6 4 2 6 9 10 6 14 9 5 7 8 2 8
## 95 96 98 99 100 101 102 103 104 105 106 108 109 110 111
## 4 5 7 6 3 4 6 2 5 5 6 3 4 6 3
## 112 113 114 115 116 119 120 121 122 124 125 126 127 128 129
## 3 4 2 2 1 7 2 4 3 3 2 1 2 2 3
## 130 131 133 134 135 136 139 140 141 142 143 144 145 147 148
## 1 3 3 2 2 2 1 1 3 1 2 3 3 3 2
## 149 151 152 153 155 160 165 278 289
## 1 2 1 1 1 1 1 1 1
Total sulfur dioxide levels range from 6 to 289, and most values lie between 22 and 62. There are multiple outliers with total sulfur dioxide levels above 122; the two most extreme values, 278 and 289, lie far beyond the rest of the data.
## 75%
## 1.001187
## 25%
## 0.9922475

## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.9901 0.9956 0.9968 0.9967 0.9978 1.0040
The density of the wines ranges from 0.9901 to 1.004, and most wines lie between 0.9956 and 0.9978. There are outliers at both ends of the distribution, with densities above 1.001187 or below 0.9922475.


## 75%
## 3.685
## 25%
## 2.925
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 2.740 3.210 3.310 3.311 3.400 4.010
## pH
## 2.74 2.86 2.87 2.88 2.89 2.9 2.92 2.93 2.94 2.95 2.98 2.99 3 3.01 3.02
## 1 1 1 2 4 1 4 3 4 1 5 2 6 5 8
## 3.03 3.04 3.05 3.06 3.07 3.08 3.09 3.1 3.11 3.12 3.13 3.14 3.15 3.16 3.17
## 6 10 8 10 11 11 11 19 9 20 13 21 34 36 27
## 3.18 3.19 3.2 3.21 3.22 3.23 3.24 3.25 3.26 3.27 3.28 3.29 3.3 3.31 3.32
## 30 25 39 36 39 32 29 26 53 35 42 46 57 39 45
## 3.33 3.34 3.35 3.36 3.37 3.38 3.39 3.4 3.41 3.42 3.43 3.44 3.45 3.46 3.47
## 37 43 39 56 37 48 48 37 34 33 17 29 20 22 21
## 3.48 3.49 3.5 3.51 3.52 3.53 3.54 3.55 3.56 3.57 3.58 3.59 3.6 3.61 3.62
## 19 10 14 15 18 17 16 8 11 10 10 8 7 8 4
## 3.63 3.66 3.67 3.68 3.69 3.7 3.71 3.72 3.74 3.75 3.78 3.85 3.9 4.01
## 3 4 3 5 4 1 4 3 1 1 2 1 2 2
The pH values of the wines range from 2.74 to 4.01, and most lie between 3.21 and 3.40. There are outliers at both ends of the distribution, with pH values above 3.685 or below 2.925.



## 75%
## 1
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.3300 0.5500 0.6200 0.6581 0.7300 2.0000
## sulphates
## 0.33 0.37 0.39 0.4 0.42 0.43 0.44 0.45 0.46 0.47 0.48 0.49 0.5 0.51 0.52
## 1 2 6 4 5 8 16 12 18 19 29 31 27 26 47
## 0.53 0.54 0.55 0.56 0.57 0.58 0.59 0.6 0.61 0.62 0.63 0.64 0.65 0.66 0.67
## 51 68 50 60 55 68 51 69 45 61 48 46 41 42 36
## 0.68 0.69 0.7 0.71 0.72 0.73 0.74 0.75 0.76 0.77 0.78 0.79 0.8 0.81 0.82
## 35 23 33 26 28 26 26 20 25 26 23 18 19 15 22
## 0.83 0.84 0.85 0.86 0.87 0.88 0.89 0.9 0.91 0.92 0.93 0.94 0.95 0.96 0.97
## 15 13 14 13 13 7 7 8 8 5 10 4 2 3 6
## 0.98 0.99 1 1.01 1.02 1.03 1.04 1.05 1.06 1.07 1.08 1.09 1.1 1.11 1.12
## 2 3 1 1 3 2 2 3 4 2 3 1 2 1 1
## 1.13 1.14 1.15 1.16 1.17 1.18 1.2 1.22 1.26 1.28 1.31 1.33 1.34 1.36 1.56
## 2 2 1 1 5 3 1 1 1 2 1 1 1 3 1
## 1.59 1.61 1.62 1.95 1.98 2
## 1 1 1 2 1 1
Sulphate levels range from 0.33 to 2.00, and most values lie between 0.55 and 0.73. The distribution is right-skewed, with multiple outliers above 1.
## 75%
## 13.5

## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 8.40 9.50 10.20 10.42 11.10 14.90
Most wines have alcohol levels between 9.5 and 11.1, with a minimum of 8.4 and a maximum of 14.9. The distribution is right-skewed, with multiple outliers above 13.5.

## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 3.000 5.000 6.000 5.636 6.000 8.000
## quality
## 3 4 5 6 7 8
## 10 53 681 638 199 18
Quality ratings range from 3 to 8. Most wines have a rating of 5 or 6 (681 and 638 wines respectively, about 82% of the dataset), while only 10 wines are rated 3 and 18 are rated 8.
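The share of wines rated 5 or 6 follows directly from the quality table above. A minimal sketch that turns the printed counts into relative frequencies:

```python
# Counts of each quality rating, copied from the table printed above.
quality_counts = {3: 10, 4: 53, 5: 681, 6: 638, 7: 199, 8: 18}

total = sum(quality_counts.values())
shares = {rating: n / total for rating, n in quality_counts.items()}

print(total)                            # 1599
print(round(shares[5] + shares[6], 3))  # 0.825
```

The heavy concentration in two middle ratings is worth keeping in mind when comparing groups: extreme ratings (3 and 8) rest on very few wines.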
Bivariate Plots Section
Pair Plot

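A pair plot of the variables in the dataset suggests that alcohol, volatile acidity, and citric acid have a strong relationship with wine quality. I explore these relationships further below, along with sulphates and total sulfur dioxide, comparing wines grouped into two classes, Good and Bad, by the binary.quality variable.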
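A "strong relationship" in a pair plot is usually judged from a correlation coefficient. Tools such as GGally's `ggpairs()` report Pearson's r in their upper panels by default; whether this pair plot does is not stated, so the sketch below only shows how r is computed, on made-up values. Because quality is an ordinal rating, a rank correlation (Spearman's rho) would be a reasonable alternative.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sy = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sx * sy)


# Illustrative values only (assumption: not rows from the dataset).
alcohol = [9.4, 9.8, 10.0, 10.5, 11.2, 12.0]
quality = [5, 5, 6, 6, 7, 7]
print(round(pearson_r(alcohol, quality), 2))
```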

## wine$binary.quality: Bad
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.1800 0.4600 0.5900 0.5895 0.6800 1.5800
## --------------------------------------------------------
## wine$binary.quality: Good
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.1200 0.3500 0.4600 0.4741 0.5800 1.0400

The boxplot above shows that volatile acidity has a similar spread in good and bad wines (interquartile ranges of 0.23 and 0.22), but bad wines have a higher mean and median. The mean and median volatile acidity were 0.4741 and 0.46 for good wines, and 0.5895 and 0.59 for bad wines. A frequency polygon of volatile acidity by wine quality shows that the higher the volatile acidity, the more likely a wine is to be of bad quality: most wines with volatile acidity above 0.6 were of bad quality.
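The grouped summaries above have the shape of R's `by()` output: one `summary()` per level of binary.quality. A rough Python equivalent is sketched below. How binary.quality was derived from the quality rating is not stated in this section, so the values and Good/Bad labels in the example are made up for illustration.

```python
import statistics
from collections import defaultdict


def six_number_summary(values):
    """Min, 1st Qu., Median, Mean, 3rd Qu., Max, as in R's summary()."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "Min.": min(values),
        "1st Qu.": q1,
        "Median": median,
        "Mean": statistics.fmean(values),
        "3rd Qu.": q3,
        "Max.": max(values),
    }


def summary_by_group(values, labels):
    """Group values by label and summarise each group separately."""
    groups = defaultdict(list)
    for value, label in zip(values, labels):
        groups[label].append(value)
    return {label: six_number_summary(vs) for label, vs in groups.items()}


# Illustrative data; the labels are hypothetical.
va = [0.70, 0.88, 0.76, 0.28, 0.60, 0.35, 0.46, 0.58]
grp = ["Bad", "Bad", "Bad", "Good", "Bad", "Good", "Good", "Good"]
for label, stats in summary_by_group(va, grp).items():
    print(label, {k: round(v, 4) for k, v in stats.items()})
```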

## wine$binary.quality: Bad
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.3300 0.5200 0.5800 0.6185 0.6500 2.0000
## --------------------------------------------------------
## wine$binary.quality: Good
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.3900 0.5900 0.6600 0.6926 0.7700 1.9500

Sulphate levels also differ between wines of different quality. Good wines had higher mean and median sulphate levels (0.6926 and 0.66) than bad wines (0.6185 and 0.58). A frequency polygon of sulphate levels by wine quality suggests that wines with higher sulphate levels are more likely to be good; wines with sulphate levels of 0.63 or higher tend to be of good quality.
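A frequency polygon is essentially a histogram drawn as a line through the bin counts. When the two groups differ in size, it can help to convert each bin into the share of wines labelled Good. A minimal sketch, using made-up sulphate values, hypothetical labels, and a bin width of 0.1 chosen for the example:

```python
from collections import Counter


def good_share_by_bin(values, labels, width=0.1):
    """Share of 'Good' wines in each bin of the given width.

    Values are assumed to be recorded to two decimal places, so they are
    converted to integer hundredths before binning; this avoids
    floating-point rounding errors at the bin edges.
    """
    step = round(width * 100)
    totals, good = Counter(), Counter()
    for value, label in zip(values, labels):
        edge = round(value * 100) // step * step  # left bin edge, in hundredths
        totals[edge] += 1
        if label == "Good":
            good[edge] += 1
    return {edge / 100: good[edge] / totals[edge] for edge in sorted(totals)}


# Illustrative sulphate values and hypothetical labels.
sulphates = [0.47, 0.52, 0.56, 0.58, 0.60, 0.64, 0.66, 0.71, 0.75, 0.80]
labels = ["Bad", "Bad", "Bad", "Good", "Bad", "Good", "Bad", "Good", "Good", "Good"]
for edge, share in good_share_by_bin(sulphates, labels).items():
    print(f"[{edge:.1f}, {edge + 0.1:.1f}): {share:.2f} good")
```

Reading off the first bin where the Good share stays above one half gives a cutoff of the kind quoted in this section, although with real data the result depends on the bin width.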

## wine$binary.quality: Bad
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 8.400 9.400 9.700 9.926 10.300 14.900
## --------------------------------------------------------
## wine$binary.quality: Good
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 8.40 10.00 10.80 10.86 11.70 14.00

Alcohol content also varies with wine quality. Good wines have a wider spread (an interquartile range of 1.7, versus 0.9 for bad wines) as well as a higher mean and median: 10.86 and 10.8 for good wines, compared with 9.926 and 9.7 for bad wines. A frequency polygon of alcohol content by wine quality shows that wines with alcohol levels above 10.25 are more likely to be of good quality.

## wine$binary.quality: Bad
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 6.00 23.75 45.00 54.65 78.00 155.00
## --------------------------------------------------------
## wine$binary.quality: Good
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 6.00 20.00 33.00 39.35 50.00 289.00

Bad wines have a wider spread of total sulfur dioxide levels (an interquartile range of 54.25, versus 30 for good wines) as well as a higher mean and median: 54.65 and 45 for bad wines, compared with 39.35 and 33 for good wines. A frequency plot of total sulfur dioxide levels by wine quality suggests that wines with levels above 80 are more likely to be of bad quality.

## wine$binary.quality: Bad
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0000 0.0800 0.2200 0.2378 0.3600 1.0000
## --------------------------------------------------------
## wine$binary.quality: Good
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0000 0.1150 0.3100 0.2999 0.4600 0.7800

A box plot of citric acid content by wine quality shows that good wines have a higher median and a wider interquartile range than bad wines (0.345 versus 0.28). Good wines also have a higher mean. The mean and median citric acid levels are 0.2999 and 0.31 for good wines, compared with 0.2378 and 0.22 for bad wines. A frequency plot of citric acid levels by wine quality suggests that good wines tend to have higher citric acid levels. Notably, the data do not show a reversal at high citric acid levels. Because citric acid can increase the formation of volatile acid, one might expect bad wines to become more common again at the highest citric acid levels, but the plot does not show this.
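The spread comparisons in this section come from box plots, which display the interquartile range (IQR) rather than the variance. A small check that recomputes each group's IQR from the (Q1, Q3) pairs printed in the grouped summaries above:

```python
# (Q1, Q3) pairs copied from the grouped summaries above.
quartiles = {
    "volatile.acidity": {"Bad": (0.46, 0.68), "Good": (0.35, 0.58)},
    "sulphates": {"Bad": (0.52, 0.65), "Good": (0.59, 0.77)},
    "alcohol": {"Bad": (9.4, 10.3), "Good": (10.0, 11.7)},
    "total.sulfur.dioxide": {"Bad": (23.75, 78.0), "Good": (20.0, 50.0)},
    "citric.acid": {"Bad": (0.08, 0.36), "Good": (0.115, 0.46)},
}

# IQR = Q3 - Q1, rounded to hide floating-point noise.
iqrs = {
    variable: {label: round(q3 - q1, 4) for label, (q1, q3) in groups.items()}
    for variable, groups in quartiles.items()
}

for variable, iqr in iqrs.items():
    wider = max(iqr, key=iqr.get)
    print(f"{variable}: IQR {iqr}, wider in {wider} wines")
```

For volatile acidity the two IQRs differ by only 0.01, which is why the spread there is described as similar rather than larger for one group.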